
Contents | DIM Guide (Volume 5, Issue 3)

DIM Editorial Office, DIM Data and Information Management, 2022-06-09

Data and Information Management

ISSN 2543-9251

Volume 5, Issue 3



Dear readers, Volume 5, Issue 3 of Data and Information Management (DIM) has been published online. Please visit the following URL to read it free of charge:

https://content.sciendo.com/view/journals/dim/dim-overview.xml





Editorial


Knowledge Entity Extraction and Text Mining in the Era of Big Data

Authors: Chengzhi Zhang, Philipp Mayr, Wei Lu and Yi Zhang





Research Article


Discovering Booming Bio-entities and Their Relationship with Funds

Authors: Fang Tan, Tongyang Zhang, Siting Yang, Xiaoyan Wu and Jian Xu

Abstract: With the increasing pressure on the National Institutes of Health (NIH) budget nowadays, it is such a major challenge to cut waste and improve efficiency in the research funding allocation. To meet this challenge, this paper explores research hotspots and disciplinary trends of the biomedical area, and discusses the relationship between these factors and the government funding, thereby uncovering biomedical hotspots of interest to academia and the evolution law of the U.S. federal government funding through an entitymetrics analysis. Considering that the rapid proliferation of biomedical literature provides large amounts of information resources for knowledge discovery, entities extracted from articles in PubMed and NIH-funded projects during 1988–2017 are taken as experimental data. They are divided into four categories: species, diseases, genes, and drugs. Subsequently, a comparative analysis of entity trajectories in the four domains is performed, which includes occurrence frequency calculations of disease entities to explore frequency variation trends in high-frequency entities and the situation of the distribution of research funds. Finally, we conduct an evolutionary analysis of two sides, respectively: the relationship between research popularity and the amount of funding; the relationship between research popularity and the number of funded projects. The results suggest that research on gene and disease entities is at the stage of rapid development. Diseases with high prevalence rate and mortality and diseases associated with genetic factors will be the emphasis of research trends in the future. The distribution of NIH grant appears obvious long tail effect and can influence overall trends in the heat of research topics. We also find that there is a strong linear correlation between the research popularity of bio-entities, and the amount and number of funding grants, respectively. However, the impact of the amount and number of grant funds on the entity research popularity is decreasing. The above results indicate the extensive applicability of entitymetrics in funding research.

Keywords: entity metrics; research funds; biomedicine; evolutionary analysis




Research Article


A Pattern and POS Auto-Learning Method for Terminology Extraction from Scientific Text

Authors: Wei Shao, Bolin Hua and Linqi Song

Abstract: A lot of new scientific documents are being published on various platforms every day. It is more and more imperative to quickly and efficiently discover new words and meanings from these documents. However, most of the related works rely on labeled data, and it is quite difficult to deal with unlabeled new documents efficiently. For this, we have introduced an unsupervised method based on sentence patterns and part of speech (POS) sequences. Our method just needs a few initial learnable patterns to obtain the initial terminology tokens and their POS sequences. In this process, new patterns are constructed and can match more sentences to find more POS sequences of terminology. Finally, we use obtained POS sequences and sentence patterns to extract terminology terms in new scientific text. Experiments on paper abstracts from Web of Knowledge show that this method is practical and can achieve a good performance on our test data.

Keywords: auto-learning; terminology extraction; unsupervised method; scientific text




Research Article


Automatic Subject Classification of Public Messages in E-government Affairs

Authors: Pei Pan and Yijin Chen

Abstract: Public messages on the Internet political inquiry platform rely on manual classification, which has the problems of heavy workload, low efficiency, and high error rate. A Bi-directional long short-term memory (Bi-LSTM) network model based on attention mechanism was proposed in this paper to realize the automatic classification of public messages. Considering the network political inquiry data set provided by the BdRace platform as samples, the Bi-LSTM algorithm is used to strengthen the correlation between the messages before and after the training process, and the semantic attention to important text features is strengthened in combination with the characteristics of attention mechanism. Feature weights are integrated through the full connection layer to carry out classification calculations. The experimental results show that the F1 value of the message classification model proposed here reaches 0.886 and 0.862, respectively, in the data set of long text and short text. Compared with three algorithms of long short-term memory (LSTM), logistic regression, and naive Bayesian, the Bi-LSTM model can achieve better results in the automatic classification of public message subjects.

Keywords: Internet politics inquiry; public message; subject classification; Bi-LSTM model; attention mechanism


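Data and Information Management (DIM) is an open-access English-language journal sponsored by the School of Information Management and the Center for Studies of Information Resources at Wuhan University, and published by the internationally renowned academic publisher De Gruyter. It aims to promote data-driven, cross-disciplinary research in information management. DIM provides full text to readers free of charge and offers authors comprehensive value-added services and a range of incentive policies. In addition to high-standard peer review, high-quality editing, a highly integrated publishing platform, and efficient online dissemination, authors pay no publication fees.

Online submission system: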
http://www.editorialmanager.com/dim/default.asp    


Layout editor: Sun Ran


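Previous issues

Contents | DIM Guide (Volume 5, Issue 2)

Contents | DIM Guide (Volume 4, Issue 2)

Contents | DIM Guide (Volume 4, Issue 1)

Contents | DIM Guide (Volume 3, Issue 3)

Contents | DIM Guide (Volume 3, Issue 2)

Contents | DIM Guide (Volume 3, Issue 1)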


DIM Editorial Office
https://content.sciendo.com/
Tel: +48227015015
